Usages of Built-in Experiments

On this page, we introduce the details of the built-in experiment classes, which cover inductive/transductive semi-supervised learning with or without a graph.

The Process of the Built-in Experiment Classes

We first introduce the main workflow of an experiment, which includes dataset loading, data manipulation, hyper-parameter selection and model evaluation. Before the experiments start, you are asked to configure the datasets, evaluation metrics, estimators and their candidate parameters. The experiment classes then carry out the whole process without much further attention: you only need to set the estimators, the metrics and the datasets to run the experiments.

We provide two built-in experiment classes: SslExperimentsWithoutGraph and SslExperimentsWithGraph. The common steps to run an experiment are:

  • Initialize an instance of the experiment class
  • Set the estimators to evaluate with the append_configs method
  • Set the datasets with append_datasets
  • Set the evaluation metrics with the set_metric and append_evaluate_metric methods
  • Run the experiments

Five Steps to Run the Experiments

Here is an example:

from s3l.Experiments import SslExperimentsWithoutGraph
from s3l.model_uncertainty.S4VM import S4VM

# list of (name, estimator instance, dict of parameters)
configs = [
    ('S4VM', S4VM(), {
        'kernel': 'RBF',
        'gamma': [0],
        'C1': [50, 100],
        'C2': [0.05, 0.1],
        'sample_time': [100]
    })
]

# list of (name, feature_file, label_file, split_path, graph_file)
datasets = [
    ('house', None, None, None, None),
    ('isolet', None, None, None, None)
]

# 1. Initialize an object of experiments class
experiments = SslExperimentsWithoutGraph(transductive=True, n_jobs=4)
# 2. Set the estimators to evaluate with `append_configs` method
experiments.append_configs(configs)
# 3. Set the datasets with `append_datasets`
experiments.append_datasets(datasets)
# 4. Set the evaluation metrics with `set_metric`
experiments.set_metric(performance_metric='accuracy_score')
# optional. Additional metrics to evaluate the best model.
experiments.append_evaluate_metric(performance_metric='zero_one_loss')
experiments.append_evaluate_metric(performance_metric='hamming_loss')
# 5. Run
results = experiments.experiments_on_datasets(
    unlabel_ratio=0.75, test_ratio=0.2, number_init=2)

During the initialization, you first choose between SslExperimentsWithoutGraph and SslExperimentsWithGraph depending on whether a graph is used. Besides, you specify the semi-supervised scheme as inductive (set transductive=False) or transductive (set transductive=True) and decide how many CPU cores you want to use.
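For example, here is a minimal sketch of an inductive, non-graph setup using the same constructor arguments as in the example above:

from s3l.Experiments import SslExperimentsWithoutGraph

# Inductive scheme is selected with transductive=False;
# n_jobs controls how many CPU cores are used for the experiments.
experiments = SslExperimentsWithoutGraph(transductive=False, n_jobs=4)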

Then, you should call append_configs, append_datasets and set_metric in any order to configure the experiments:

  • append_configs takes a list of tuples of the form (name, object, parameters_dict). name is any string you like, object is an instance of an estimator, and parameters_dict is a dict whose keys are parameter names of the corresponding estimator and whose values are lists of candidate values.
  • append_datasets takes a list of tuples of the form (name, feature_file, label_file, split_path, graph_file). name is a string used in the output; feature_file, label_file, split_path and graph_file can be strings or None, and each string should be the absolute path of the file you provide. If you use built-in datasets, feature_file and label_file can be None; if split_path is None, the experiment class splits the data every time you run; graph_file must be set when the experiment needs a graph (see the sketch after this list).
  • set_metric configures the evaluation metric used in hyper-parameter selection. The best model is selected based on this metric. [Here is a list of supported metrics](http). Please note that the parameter metric_large_better indicates whether larger values of the metric are better.
  • append_evaluate_metric appends additional metrics that are used to evaluate the best model selected during hyper-parameter selection. [Here is a list of supported metrics](http)
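As a rough sketch, configuring a user-provided dataset together with the selection metric could look as follows (the file paths are hypothetical placeholders for your own files):

# (name, feature_file, label_file, split_path, graph_file)
# The paths below are hypothetical placeholders; replace them with the
# absolute paths of your own files, or pass None for built-in datasets.
my_datasets = [
    ('my_data', '/abs/path/to/features', '/abs/path/to/labels', None, None),
    ('house', None, None, None, None)  # built-in dataset: files can be None
]
experiments.append_datasets(my_datasets)

# Hyper-parameters are selected by accuracy_score; metric_large_better
# indicates that larger values of this metric mean a better model.
experiments.set_metric(performance_metric='accuracy_score',
                       metric_large_better=True)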

Attention

  1. In order to reduce repetitive code, we define some protocols that estimators and metrics should follow. You can refer to How to Implement Your Own Estimators for more details.
  2. When debugging, you should disable parallel mode by setting n_jobs to 1; otherwise your code will not stop at breakpoints (see the snippet after this list).
  3. If the built-in experiment process does not meet your demands, you can design your own settings (refer to How to Design Your Own Experiments and Experiments).
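For instance, a debugging-friendly construction of the experiment from the example above simply uses a single worker:

# n_jobs=1 disables parallel mode, so breakpoints inside your estimators
# and metrics are actually hit while debugging.
experiments = SslExperimentsWithoutGraph(transductive=True, n_jobs=1)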